The use of digital technologies in agriculture has become very important for
protecting trees from disease and limiting its development, which leads to
increased production. This paper therefore proposes a modified analytical
model for analyzing data and graphical fragments of fruit tree leaves using
priority fuzzy C-means (PFCM). The model relies on a proposed distance measure
to obtain a clustering with a lower error rate and higher accuracy, for the
purpose of monitoring the development of fruit tree diseases. To classify the
diseases and the medications needed for each disease, a database containing
large samples of data and images was created, since the analysis of previous
studies shows that analyzing large amounts of data gives accurate results.
The proposed method was used in smart gardens with large areas and produced
the desired results.
1. INTRODUCTION

Today, the agro-industrial complex lags far behind the world leaders in the use of digital
technologies: these technologies are used in agriculture at a rate of less than 2%, while in the
countries of the European Union and the USA the use of digital solutions is 80% and 60%, respectively. The
results of current studies lead to the disappointing conclusion that the population of our country is not
sufficiently provided with horticultural products, both fresh and processed. Given the relevance of
import substitution for supplying the population with competitive horticultural products in various
forms, agricultural producers need to adopt digital solutions widely, in particular in horticulture.
One of the key areas in the organization of digital agriculture is the development of horticulture
based on new principles of control, optimization and management decision-making.

In industrial horticulture, very effective techniques for growing fruit crops are currently in use.
However, insufficient attention is paid to systemic intensification, which can provide a multiple
increase in the economic efficiency of the land occupied by orchards. Difficult climatic conditions,
as a rule, bring great risks to the organization of horticulture. In this regard, the fundamental
elements of new technologies for regulating horticulture include the regulation of irrigation systems,
the use of fertilizers, and digital monitoring and control systems [1], [2]. Irrigation and the
organization of drip irrigation are considered in [3]-[7].
The widespread use of the smart garden concept makes it possible to conduct an operational analysis
of soil and climatic conditions based on the processing of big data [8]-[11] obtained from sensors
installed both in the vicinity of the root system of a single tree and directly on its trunk. It also
makes it possible to apply fertilizers of various types depending on the characteristics of a
particular tree, and to carry out preventive measures against pests and diseases of fruit plants based
on the analysis of graphic information, which is represented by images of tree leaves. At the same
time, the organization and storage of large volumes of data and graphic information, and the choice of
methods for processing graphic information, are of particular importance, since their analysis allows
prompt measures to be taken during fruit growth, taking into account the state of the fruit trees. To
implement an intelligent system and predict the water regime of the soil, it is necessary to use
mathematical modeling of moisture transfer processes. However, such a system lacks the ability to
process large amounts of data and graphic information in a way that identifies characteristic classes
[12]-[14], which is necessary for solving problems of analysis and forecasting and which increases the
efficiency of storage and of managerial decision-making. Currently, a large number of methods exist for
processing the data used in information systems that implement the smart garden concept [15]-[17].
However, the images obtained from a camera may have defects: defocusing, blurring, distortion, and
incorrect brightness and contrast. In addition, analysis of the soil data obtained from the sensors
determines the percentage of moisture and drought, which supports decision-making in smart garden
control systems. In this regard, the paper is aimed at building analytical models of information
processing in information systems used within the smart garden concept. The modified model uses a
distance measure different from the one usually used in segmentation methods, together with
contribution criteria; together these lead to increased accuracy in the diagnosis of diseases and in
the organization of large amounts of data.

The paper is organized as follows. Section 2 presents basic information on, and the differences
between, measures of the distance between the points of a data set or image and the cluster centers,
in order to give a clear picture of the advantages and disadvantages of each measure. Section 3
presents the proposed method for analyzing the data collected from the sensors and segmenting the
fruit tree leaf images. Section 4 analyzes the results of using the modified analysis model. The last
section concludes the paper.


2. DISTANCE MEASURES 

The purpose of cluster analysis is to stratify the initial observations into clearly defined
clusters, that is, clusters that lie at a certain distance from each other but do not break into parts
that are equally distant from each other. In effect, this is a grouping that identifies the natural
stratification of the data. Cluster analysis can be done in several ways. Each method is characterized
by three features: a measure of the proximity of any two objects, a measure of the proximity of two
groups of objects, and a rule for choosing the final version of the classification. The basis for
clustering is the matrix of distances between objects. There are several ways to determine the
distance between any two objects.


2.1. Euclidean distance 
This seems to be the most common type of distance. It is simply a geometric distance in 
multidimensional space and is calculated as (1). 


$$\mathrm{dist}(x, y) = \sqrt{\sum_i (x_i - y_i)^2} \qquad (1)$$


Note that the Euclidean distance (and its square) is calculated from the original data, not from the 
standardized data. This is the usual way of calculating it, which has certain advantages (for example, the 
distance between two objects does not change when a new object is introduced into the analysis, which may 
turn out to be an outlier). However, distances can be greatly affected by differences between the axes from 
which the distances are calculated. For example, if one of the axes is measured in centimeters, and then you 
convert it to millimeters (by multiplying the values by 10), then the final Euclidean distance (or the square of 
the Euclidean distance) calculated from the coordinates will change dramatically, and, as a result, the results 
of the cluster analysis can be very different from the previous ones. 
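
The effect of axis units described above can be illustrated with a small sketch. The two-feature objects below (leaf length and leaf mass) and the helper names are hypothetical and serve only to show how converting one axis from centimeters to millimeters can change which object is nearest.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length coordinate sequences, as in (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def to_mm(point):
    """Convert the first axis from centimeters to millimeters; leave the second axis as is."""
    return (point[0] * 10.0, point[1])


def nearest(query, candidates):
    """Return the name of the candidate closest to the query point."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))


# Hypothetical objects: (leaf length in cm, leaf mass in g).
query = (5.0, 10.0)
in_cm = {"a": (5.5, 13.0), "b": (7.0, 10.5)}
in_mm = {name: to_mm(p) for name, p in in_cm.items()}

print(nearest(query, in_cm))         # nearest object with length in centimeters
print(nearest(to_mm(query), in_mm))  # nearest object with length in millimeters
```

With these values the nearest object differs between the two unit choices, which is why features are often standardized before clustering when their scales are not comparable.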


2.2. The square of the Euclidean distance 


Sometimes you may want to square the standard Euclidean distance to give more weight to more
distant objects. This distance is calculated as (2).

$$\mathrm{dist}(x, y) = \sum_i (x_i - y_i)^2 \qquad (2)$$


2.3. City block distance (Manhattan distance) 

This distance is simply the sum of the absolute differences over the coordinates. In most cases, this
measure of distance leads to the same results as the usual Euclidean distance. However, note that for
this measure the influence of individual large differences (outliers) decreases (because they are not
squared). Manhattan distance is calculated using (3).


$$\mathrm{dist}(x, y) = \sum_i |x_i - y_i| \qquad (3)$$


2.4. Minkowski distance and Chebyshev distance 

The Minkowski distance is a parametric metric on Euclidean space that can be thought of as a
generalization of the Euclidean distance and the city block (Manhattan) distance:
$\mathrm{dist}(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}$.
For $p \geq 1$ it is a metric (the Minkowski distance); for $p < 1$ it is not a metric.
When $p \to \infty$ (so that $1/p \to 0$), it becomes the Chebyshev distance: the distance between two
n-dimensional points or vectors is the maximum modulus of the difference in the coordinates of these
points. The Chebyshev distance can be useful when one wishes to define two objects as "different" if
they differ in any one coordinate (any one dimension). The Chebyshev distance is calculated by (4).


$$\mathrm{dist}(x, y) = \max_i |x_i - y_i| \qquad (4)$$
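
The relationship between the measures in this section can be checked numerically. The sketch below is a plain-Python illustration with arbitrary sample points; it uses the standard Minkowski form with exponent p, which reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2, and approaches the Chebyshev distance as p grows.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p (a metric only for p >= 1)."""
    if p < 1:
        raise ValueError("p must be >= 1 for the Minkowski distance to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance: the largest absolute coordinate difference, as in (4)."""
    return max(abs(a - b) for a, b in zip(x, y))


# Arbitrary sample points; the coordinate differences are 3, 2 and 0.5.
x = (1.0, 2.0, 3.0)
y = (4.0, 0.0, 3.5)

print("Manhattan (p=1):", minkowski(x, y, 1))
print("Euclidean (p=2):", minkowski(x, y, 2))
print("Minkowski (p=50):", minkowski(x, y, 50))
print("Chebyshev:", chebyshev(x, y))
```

For large p the largest coordinate difference dominates the sum, so the Minkowski value moves toward the Chebyshev value.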


3. MODIFIED ANALYTICAL MODEL OF PRIORITY C-MEANS METHODS FOR 
PROCESSING DATA AND FRAGMENTS OF GRAPHIC INFORMATION 

Consider an analytical model that performs fuzzy clustering and whose construction takes into
account both local and non-local information and their relative contribution [18]-[21]. In contrast to
the standard approach based on FCM, the proposed fuzzy clustering model changes the definition of
distance in order to take into account the mutual influence of pixels during segmentation and to
reduce the effect of noise.

The optimization criterion, on the basis of which the clustering is carried out, has the form: 


$$Q_{PFCM}(U, C) = \sum_{i=1}^{c} \sum_{k} (u^{*}_{ik})^{m} \, d_g^{2}(x_k, c_i), \qquad g \in \{0, 1\} \qquad (5)$$

If $g = 0$, then $d_g$ is the Chebyshev distance.

If $g = 1$, then $d_g$ is the Minkowski distance.

Here $u^{*}_{ik}$ is the spatial membership function, and $P_{ik}$ is the a priori probability that
pixel $k$ is in cluster $i$:

$$P_{ik} = \frac{NN_i(k)}{N_k} \qquad (6)$$

where $NN_i(k)$ is the number of pixels in the vicinity of pixel $k$ that are located in cluster $i$
after removing the fuzziness, $N_k$ is the total number of pixels in the neighborhood, and $d_{iz}$ is
the distance between cluster $i$ and its neighborhood $z$. This makes it possible to calculate the
center of each cluster $c_i$ using (7).

$$c_i = \frac{\sum_{k} (u^{*}_{ik})^{m} x_k}{\sum_{k} (u^{*}_{ik})^{m}} \qquad (7)$$
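
As an illustration of (6) and (7), the following sketch computes the a priori probability for one pixel from a defuzzified (hard) label map, and a cluster center as a membership-weighted mean. It is a minimal sketch under stated assumptions, not the authors' implementation: the square neighborhood, its radius, the exclusion of the pixel itself, the exponent m in the center formula, and the function names are all assumptions made for the example.

```python
def prior_probability(labels, row, col, cluster, radius=1):
    """P_ik as in (6): share of the neighbours of pixel (row, col) labelled `cluster`.

    Assumptions: a square window of the given radius, clipped at the image
    border, and the pixel itself is not counted as its own neighbour.
    """
    rows, cols = len(labels), len(labels[0])
    in_cluster = 0
    total = 0
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            if (r, c) == (row, col):
                continue
            total += 1
            if labels[r][c] == cluster:
                in_cluster += 1
    return in_cluster / total


def cluster_center(values, memberships, m=2.0):
    """Cluster center as the mean of the values weighted by membership ** m."""
    weights = [u ** m for u in memberships]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical 3x3 hard label map with two clusters (0 and 1).
labels = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
]

print(prior_probability(labels, 1, 1, cluster=1))       # centre pixel, 8 neighbours
print(prior_probability(labels, 0, 0, cluster=0))       # corner pixel, 3 neighbours
print(cluster_center([10.0, 20.0, 30.0], [1.0, 0.5, 0.0]))
```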


A distinctive feature of the proposed fuzzy clustering model is that the distance measure [22]-[25],
which should take into account both local and non-local information, is formalized as:


$$d^{2}(x_j, v_i) = (1 - \lambda_j)\, d_l^{2}(x_j, v_i) + \lambda_j\, d_{nl}^{2}(x_j, v_i) \qquad (8)$$


where $d_l(x_j, v_i)$ is the distance that takes into account the influence of local information,
$d_{nl}(x_j, v_i)$ is the distance that takes into account the influence of non-local information, and
$\lambda_j$ is a weight factor ranging from 0 to 1, which is set by the user when solving the problem
and can be corrected when an unsatisfactory solution is obtained. Under (8), a larger value of
$\lambda_j$ increases the contribution of the non-local distance, and a smaller value increases the
contribution of the local distance. The measure for local information is defined as:


$$d_l^{2}(x_j, v_i) = \frac{\sum_{x_k \in N_j} \omega_l(x_k, x_j)\, d^{2}(x_k, v_i)}{\sum_{x_k \in N_j} \omega_l(x_k, x_j)} \qquad (9)$$


where $d_l^{2}(x_j, v_i)$ is the measure determined by local information and
$\omega_l(x_k, x_j)$ is the weight of pixel $x_k$ in the neighborhood $N_j$.

The measure $d_{nl}$ determined by non-local information is defined as a weighted sum over image
pixels:


$$d_{nl}^{2}(x_j, v_i) = \sum_{x_k \in N_j} \omega_{nl}(x_k, x_j)\, d^{2}(x_k, v_i) \qquad (10)$$


where $\omega_{nl}(x_k, x_j)$ is the pixel weight in $N_j$.
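
The combination in (8)-(10) can be sketched for a single pixel as follows. This is a hedged illustration only: it assumes that the local measure is a normalized weighted mean of squared distances, that the non-local measure is a plain weighted sum, and that the weights are given; the weight values, squared distances and function names are hypothetical.

```python
def local_distance_sq(weights, dists_sq):
    """Local measure: weighted mean of squared distances, normalized by the weight sum."""
    return sum(w * d for w, d in zip(weights, dists_sq)) / sum(weights)


def nonlocal_distance_sq(weights, dists_sq):
    """Non-local measure: weighted sum of squared distances (weights assumed to sum to 1)."""
    return sum(w * d for w, d in zip(weights, dists_sq))


def combined_distance_sq(d_local_sq, d_nonlocal_sq, lam):
    """Convex combination of the local and non-local measures, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * d_local_sq + lam * d_nonlocal_sq


# Hypothetical squared distances from three neighbouring pixels to one cluster center.
dists_sq = [4.0, 1.0, 9.0]
d_l = local_distance_sq([1.0, 2.0, 1.0], dists_sq)      # 3.75
d_nl = nonlocal_distance_sq([0.2, 0.5, 0.3], dists_sq)  # about 4.0

print(combined_distance_sq(d_l, d_nl, lam=0.25))
```

Setting `lam=0.0` uses only the local measure and `lam=1.0` only the non-local one, which makes the role of the weight factor easy to inspect on small examples.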

Membership for representative points should be as high as possible, and for non-representative
points as low as possible. An optimization criterion that satisfies these requirements is given by
(11):


$$\min J_m(x, \mu, c) = \sum_{i=1}^{c} \sum_{j} \mu_{ij}^{m} d_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j} (1 - \mu_{ij})^{m} \qquad (11)$$

where
$d_{ij}$ is the distance between point $j$ and the center of cluster $i$;
$\mu_{ij}$ is the membership degree;
$m$ is the degree of fuzziness;
$\eta_i$ is a positive number;
$c$ is the number of clusters.


$$\mu_{ij} = \frac{1}{1 + \left( d_{ij}^{2} / \eta_i \right)^{1/(m-1)}} \qquad (12)$$


The value of $\eta_i$ corresponds to the distance at which the membership of a point in a cluster
is 0.5. The constructed analytical model allows $\eta_i$ to be fixed or changed at each iteration by
changing $d_{ij}$ and $\mu_{ij}$, which increases the resistance to noise when searching for valid
clusters and determining their centers. Figure 1 shows the workflow of the proposed method for
processing data and images of fruit tree leaves.
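
The role of the parameter described above can be illustrated with a short sketch. It assumes the widely used possibilistic membership form, 1 / (1 + (d^2 / eta)^(1/(m-1))), in which the parameter is compared with the squared distance; this form is an assumption chosen because it yields a membership of exactly 0.5 when the squared distance equals the parameter. The function and argument names are hypothetical.

```python
def possibilistic_membership(dist_sq, eta, m=2.0):
    """Membership of a point in a cluster under the assumed possibilistic form.

    dist_sq: squared distance from the point to the cluster center.
    eta:     positive scale parameter of the cluster.
    m:       degree of fuzziness, must be greater than 1.
    """
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    return 1.0 / (1.0 + (dist_sq / eta) ** (1.0 / (m - 1.0)))


eta = 4.0
for dist_sq in (0.0, 1.0, 4.0, 16.0):
    print(dist_sq, possibilistic_membership(dist_sq, eta))
```

Because the membership of a point depends only on its own distance to the center and not on its distances to other clusters, distant noise points receive low membership in every cluster, which is the source of the noise resistance mentioned in the text.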


[Figure 1 blocks: image information table; sensors data table; interface; modified analytical
model PFCM; fruit trees database]


Figure 1. Structural model analysis of the development of diseases of fruit trees
4. RESULTS AND DISCUSSION

Processing the experimental data of the drip irrigation system, obtained in the course of the
research, with the clustering method produced three clusters corresponding to morning (M), afternoon
(A) and evening (E) watering, each with three factors: pressure in the drip system (Z1), drip line
length (Z2) and water outlet diameter (Z3).

Table 1 shows that trees are watered in the morning, afternoon and evening. Figure 2 shows that
different volumes of water are used. A lack of watering, as well as an excess of it, leads to the
emergence of a number of diseases, which inevitably change the surface of the leaves.


Table 1. Data clustering depending on factors

Factors    Clusters
           M      A      E
Z1         0.8    1.8    1.2
Z2         120    150    180
Z3         1.0    15     2.0


Figure 2. Clusters with different volumes of water


The similarity coefficient, the percentage of false positives and the percentage of false negatives
were used to analyze the segmentation results. The results presented in Table 2 allow us to conclude
that the introduction of the Chebyshev and Minkowski metrics is reasonable and provides better results.

The similarity coefficient shows how similar clusters are to each other during clustering; a large
value of this coefficient indicates an accurate segmentation method. The similarity is measured by
comparing the number of co-occurring pixels with the number of individual pixels for each cluster in
the relationship. A false positive (FP) means that a pixel is incorrectly classified as belonging to
a cluster when it actually does not. A false negative (FN) means that a pixel is incorrectly
classified as not belonging to a cluster when it in fact does.
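
The three quality measures can be sketched for a single cluster as follows. The exact formulas used in the experiments are not given in the text, so this example assumes a Dice-style similarity coefficient (twice the number of co-occurring pixels divided by the sum of the two cluster sizes) and expresses false positives and false negatives as percentages of the reference cluster size. The masks and the function name are hypothetical.

```python
def segmentation_scores(predicted, reference):
    """Return (similarity %, false positive %, false negative %) for one cluster.

    predicted, reference: equal-length sequences of 0/1 flags, one per pixel,
    where 1 means the pixel is assigned to the cluster.
    """
    tp = sum(1 for p, t in zip(predicted, reference) if p and t)
    fp = sum(1 for p, t in zip(predicted, reference) if p and not t)
    fn = sum(1 for p, t in zip(predicted, reference) if t and not p)
    n_pred = tp + fp   # pixels assigned to the cluster by the algorithm
    n_ref = tp + fn    # pixels that belong to the cluster in the reference
    similarity = 100.0 * 2 * tp / (n_pred + n_ref)  # assumed Dice-style form
    return similarity, 100.0 * fp / n_ref, 100.0 * fn / n_ref


# Hypothetical flattened masks for eight pixels.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

similarity, fp_pct, fn_pct = segmentation_scores(predicted, reference)
print(similarity, fp_pct, fn_pct)
```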


Table 2. Segmentation accuracy for various clustering algorithms

Clustering algorithm                 Similarity   Percentage of      Percentage of
                                                  false positives    false negatives
FCM method                           85.23        21.25              7.50
PFCM method                          91.38        11.76              4.39
PFCM method with Chebyshev metric    93.8         11.60              3.20
PFCM method with Minkowski metric    93.5         11.80              3.35


5. CONCLUSION 
In this paper, we increased the percentage of correct diagnoses of fruit tree diseases and improved
the efficiency of diagnosing them and monitoring their development by using a modified analytical
model to process data and graphic information in the form of fruit tree leaf images. The results allow
us to conclude that the introduction of the Chebyshev and Minkowski metrics is reasonable and provides
better results. An analysis of the results showed that the proposed modified clustering method reduced
the percentage of false positives from 21.25 to 11.6 and the percentage of false negatives from
7.5 to 3.2.